Robust Speaker Localization Guided by Deep Learning-Based Time-Frequency Masking
Similar resources
Learning the Time-delay Manifold for Robust Speaker Localization
We present an algorithm for high dimensional density estimation which is efficient (both computationally and statistically) when the distribution is concentrated close to a low dimensional smooth manifold. The algorithm uses several random projections to generate a hierarchical mixture of Gaussians which rapidly converges to the underlying manifold. We use this algorithm to perform robust estim...
Localization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks
Time-frequency (T-F) masking is an effective method for stereo speech source separation. However, reliable estimation of the T-F mask from sound mixtures is a challenging task, especially when room reverberations are present in the mixtures. In this paper, we propose a new stereo speech separation system where deep neural networks are used to generate a soft T-F mask for separation. More specific...
Robust speech separation using time-frequency masking
A multi-microphone time-frequency speech masking technique is proposed. This technique utilizes both the time-frequency magnitude and phase information in order to estimate the Signal-to-Noise Ratio (SNR) maximizing masking coefficients for each time-frequency block given that the direction (or alternatively, the time-delay of arrival) of the speaker of interest is known. Using this masking algo...
Robust digit recognition using phase-dependent time-frequency masking
A technique using the time-frequency phase information of two microphones is proposed to estimate an ideal time-frequency mask using time-delay-of-arrival (TDOA) of the signal of interest. At a signal-to-noise ratio (SNR) of 0 dB, the proposed technique using two microphones achieves a digit recognition rate (average over 5 speakers, each speaking 20-30 digits) of 71%. In contrast, delay-and-sum b...
Time-frequency masking for large scale robust speech recognition
Time-frequency mask estimation has shown considerable success recently. In this paper, we demonstrate its utility as a feature enhancement frontend for large vocabulary conversational speech recognition. Additionally, we investigate how masking compares with feature denoising, which directly reconstructs clean features from noisy ones. We train a mask estimator that predicts ideal ratio masks. ...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2019
ISSN: 2329-9290, 2329-9304
DOI: 10.1109/taslp.2018.2876169